Generalized Cores 



VLADIMIR BATAGELJ and MATJAŽ ZAVERŠNIK
University of Ljubljana, FMF, Department of Mathematics, 
and IMFM Ljubljana, Department of TCS, 
Jadranska 19, 1000 Ljubljana, Slovenia 

Cores are, besides connectivity components, one of the few concepts that provide us with efficient
decompositions of large graphs and networks.

In this paper, a generalization of the notion of the core of a graph, based on a vertex property function,
is presented. It is shown that for local monotone vertex property functions the corresponding
cores can be determined in $O(m \max(\Delta, \log n))$ time.
1. CORES

The notion of core was introduced by Seidman in 1983 [Seidman 1983; Wasserman 
and Faust 1994]. 

Let $G = (V, L)$ be a simple graph. $V$ is the set of vertices and $L$ is the set of lines
(edges or arcs). We will denote $n = |V|$ and $m = |L|$. A subgraph $H = (C, L|C)$
induced by the set $C \subseteq V$ is a $k$-core or a core of order $k$ iff $\forall v \in C : \deg_H(v) \geq k$,
and $H$ is a maximum subgraph with this property. The core of maximum order is
also called the main core. The core number of a vertex $v$ is the highest order of
a core that contains this vertex. Since the set $C$ determines the corresponding core
$H$, we also often call it a core.

The degree $\deg(v)$ can be the number of neighbors in an undirected graph or, in a
directed graph, the in-degree, out-degree, in-degree + out-degree, ..., each choice
determining a different type of core.

The cores have the following important properties: 

— The cores are nested: $i < j \Rightarrow H_j \subseteq H_i$

— Cores are not necessarily connected subgraphs. 

In this paper we present a generalization of the notion of core from degrees to 
other properties of vertices. 
Fig. 1. 0, 1, 2 and 3 core

2. P-CORES 

Let $\mathcal{N} = (V, L, w)$ be a network, where $G = (V, L)$ is a graph and $w : L \to \mathbb{R}$ is
a function assigning values to lines. A vertex property function on $\mathcal{N}$, or a $p$
function for short, is a function $p(v, U)$, $v \in V$, $U \subseteq V$, with real values.

Examples of vertex property functions: let $N(v)$ denote the set of neighbors
of vertex $v$ in graph $G$, and $N(v, U) = N(v) \cap U$, $U \subseteq V$.

(1) $p_1(v, U) = \deg_U(v)$

(2) $p_2(v, U) = \mathrm{indeg}_U(v)$

(3) $p_3(v, U) = \mathrm{outdeg}_U(v)$

(4) $p_4(v, U) = \mathrm{indeg}_U(v) + \mathrm{outdeg}_U(v)$

(5) $p_5(v, U) = \sum_{u \in N(v, U)} w(v:u)$, where $w : L \to \mathbb{R}_0^+$

(6) $p_6(v, U) = \max_{u \in N(v, U)} w(v:u)$, where $w : L \to \mathbb{R}$

(7) $p_7(v, U)$ = number of cycles of length $k$ through vertex $v$ in the subgraph induced by $U$
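As an illustration, a few of these property functions can be written down directly. The following Python sketch assumes an undirected network stored as a dictionary `adj` with `adj[v][u]` equal to $w(v:u)$; this representation, the function names, and the convention that $p_6$ of a vertex with no neighbor in $U$ is $-\infty$ are assumptions made for the example, not part of the paper.

```python
import math
from typing import Dict, Set

# Assumed representation: adj[v][u] = w(v:u) for an undirected network
Network = Dict[str, Dict[str, float]]


def neighbors_in(adj: Network, v: str, U: Set[str]) -> Set[str]:
    """N(v, U) = N(v) ∩ U."""
    return set(adj[v]) & U


def p1(adj: Network, v: str, U: Set[str]) -> float:
    """deg_U(v): number of neighbors of v inside U."""
    return len(neighbors_in(adj, v, U))


def p5(adj: Network, v: str, U: Set[str]) -> float:
    """Sum of the (nonnegative) values of lines from v into U."""
    return sum(adj[v][u] for u in neighbors_in(adj, v, U))


def p6(adj: Network, v: str, U: Set[str]) -> float:
    """Largest value of a line from v into U; -inf if there is none (assumed convention)."""
    return max((adj[v][u] for u in neighbors_in(adj, v, U)), default=-math.inf)
```

For instance, with `adj = {"a": {"b": 4.0}, "b": {"a": 4.0, "c": 1.0}, "c": {"b": 1.0}}` and $U = \{a, b, c\}$, vertex `b` has $p_1 = 2$, $p_5 = 5$ and $p_6 = 4$.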

The subgraph $H = (C, L|C)$ induced by the set $C \subseteq V$ is a $p$-core at level $t \in \mathbb{R}$

iff

— $\forall v \in C : t \leq p(v, C)$
— $C$ is the maximal such set.

The function $p$ is monotone iff it has the property

$$C_1 \subseteq C_2 \Rightarrow \forall v \in V : p(v, C_1) \leq p(v, C_2)$$

All of the functions $p_1$ to $p_7$ are monotone.

For a monotone function $p$, the $p$-core at level $t$ can be determined by successively
deleting vertices whose value of $p$ is lower than $t$:

C := V;

while ∃v ∈ C : p(v, C) < t do C := C \ {v};
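A direct transcription of this procedure might look as follows. It is a minimal Python sketch, assuming a graph given as a dictionary mapping each vertex to its set of neighbors and using $p_1(v, U) = \deg_U(v)$ as the monotone function; the names `p_core` and `degree_in` are hypothetical. Each pass rescans all remaining vertices, so the sketch only illustrates the deletion rule and is not meant to be efficient.

```python
from typing import Callable, Dict, Set

Graph = Dict[str, Set[str]]
# A property function p(v, U, adj) returning a real value
PropFn = Callable[[str, Set[str], Graph], float]


def degree_in(v: str, U: Set[str], adj: Graph) -> float:
    """p_1(v, U) = deg_U(v): number of neighbors of v inside U (monotone)."""
    return len(adj[v] & U)


def p_core(adj: Graph, p: PropFn, t: float) -> Set[str]:
    """Delete vertices with p(v, C) < t until none is left; return the remaining set C."""
    C = set(adj)
    while True:
        # look for any vertex whose value in the current set C is below t
        v = next((u for u in sorted(C) if p(u, C, adj) < t), None)
        if v is None:
            return C
        C.remove(v)


# Example: a triangle a-b-c with a pendant vertex d attached to a
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(sorted(p_core(adj, degree_in, 2)))  # ['a', 'b', 'c']
```

Because $p_1$ is monotone, the scan order (here alphabetical, via `sorted`) does not affect the result.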

Theorem 2.1. For a monotone function $p$, the above procedure determines the $p$-core at level $t$.
Proof. The set $C$ returned by the procedure evidently has the first property
from the $p$-core definition.

Let us also show that for monotone $p$ the result of the procedure is independent
of the order of deletions.

Suppose the contrary: there are two different $p$-cores at level $t$, determined by
sets $C$ and $D$. The core $C$ was produced by deleting the sequence $u_1, u_2, u_3, \ldots, u_p$,
and $D$ by the sequence $v_1, v_2, v_3, \ldots, v_q$. Assume that $D \setminus C \neq \emptyset$. We will
show that this leads to a contradiction.

Take any $z \in D \setminus C$. To show that it can also be deleted, we first apply the sequence
$v_1, v_2, v_3, \ldots, v_q$ to get $D$. Since $z \in D \setminus C$, it appears in the sequence $u_1, u_2, u_3, \ldots,
u_s = z$. Let $U_0 = \emptyset$ and $U_i = U_{i-1} \cup \{u_i\}$. Then, since $\forall i \in 1..p : p(u_i, V \setminus U_{i-1}) < t$
and $D \setminus U_{i-1} \subseteq V \setminus U_{i-1}$, we have, by monotonicity of $p$, also $\forall i \in 1..p : p(u_i, D \setminus U_{i-1}) < t$.
Therefore all $u_i \in D \setminus C$ are also deleted, so $D \setminus C = \emptyset$, a contradiction.

Since the result of the procedure is uniquely defined and vertices outside $C$ have
$p$ value lower than $t$, the final set $C$ also satisfies the second condition from the
definition of the $p$-core: it is the $p$-core at level $t$. □

COROLLARY 2.2. For a monotone function $p$ the cores are nested:

$$t_1 < t_2 \Rightarrow H_{t_2} \subseteq H_{t_1}$$

Proof. This follows directly from Theorem 2.1. Since the result is independent of
the order of deletions, and every vertex that can be deleted at level $t_1$ can also be deleted
at level $t_2 > t_1$, we can first determine $H_{t_1}$. Afterwards we possibly delete some
additional vertices, thus producing $H_{t_2}$. Therefore $H_{t_2} \subseteq H_{t_1}$. □

Example of a nonmonotone p function: Consider the following p function

$$p(v, U) = \begin{cases} 0 & N(v, U) = \emptyset \\ \dfrac{1}{|N(v, U)|} \sum_{u \in N(v, U)} w(v:u) & \text{otherwise} \end{cases}$$

where $w : L \to \mathbb{R}_0^+$, on the network $\mathcal{N} = (V, L, w)$ with $V = \{a, b, c, d, e, f\}$ and the lines and their values

L:  (a:b)  (b:c)  (c:d)  (b:e)  (e:f)
w:    4      1      3      1      3

We get different results depending on whether we first delete the vertex b or c (or 
e) - see Figure 2. 

The original network is a $p$-core at level 2. Applying the algorithm to the network,
we have three choices for the first vertex to be deleted: $b$, $c$ or $e$. Deleting $b$ we get,
after removing the isolated vertex $a$, the $p$-core $C_1 = \{c, d, e, f\}$ at level 3. Note
that the values of $p$ in vertices $c$ and $e$ increased from 2 to 3.

Deleting $c$ (or symmetrically $e$; we analyze only the first case) we get, after removing
the isolated vertex $d$, the set $C_2 = \{a, b, e, f\}$ at level 2; the value at $b$ increased to 2.5.
In the next step we can delete either the vertex $b$, producing the set $C_3 = \{e, f\}$ at level 3,
or the vertex $e$, producing the $p$-core $C_4 = \{a, b\}$ at level 4.

As we see, the result of the algorithm depends on the order of deletions. The
$p$-core at level 4 is not contained in the $p$-core at level 3.
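The order dependence can be checked mechanically. The Python sketch below encodes the network of the example together with the average-value function $p$ defined above, and at each step deletes the first vertex of a given priority list whose value is below $t$; the priority lists and the helper names `p_avg` and `peel` are assumptions made for this illustration.

```python
from typing import Dict, List, Set

# The example network: W[v][u] = w(v:u), undirected
W: Dict[str, Dict[str, float]] = {}
for x, y, w in [("a", "b", 4), ("b", "c", 1), ("c", "d", 3), ("b", "e", 1), ("e", "f", 3)]:
    W.setdefault(x, {})[y] = w
    W.setdefault(y, {})[x] = w


def p_avg(v: str, U: Set[str]) -> float:
    """Average value of lines from v into U; 0 if v has no neighbor in U."""
    nbrs = [u for u in W[v] if u in U]
    return sum(W[v][u] for u in nbrs) / len(nbrs) if nbrs else 0.0


def peel(t: float, priority: List[str]) -> Set[str]:
    """Delete vertices with p_avg < t, always taking the first eligible one in `priority`."""
    C = set(W)
    while True:
        v = next((u for u in priority if u in C and p_avg(u, C) < t), None)
        if v is None:
            return C
        C.remove(v)


print(sorted(peel(3, ["b", "a", "c", "d", "e", "f"])))  # ['c', 'd', 'e', 'f']
print(sorted(peel(3, ["c", "d", "b", "a", "e", "f"])))  # ['e', 'f']
```

Starting with $b$ reproduces $C_1$, while starting with $c$ ends in $C_3$: the same threshold yields two different sets.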
Fig. 2. Nonmonotone p function

3. ALGORITHMS 

3.1 Algorithm for p-core at level t 

The $p$ function is local iff

$$p(v, U) = p(v, N(v, U))$$

The functions $p_1$ to $p_6$ from the examples are local; $p_7$ is not local for $k \geq 4$.

In the following we shall also assume that for the function $p$ there exists a constant
$p_0$ such that

$$\forall v \in V : p(v, \emptyset) = p_0$$

For a local $p$ function an $O(m \max(\Delta, \log n))$ algorithm for determining the $p$-core
at level $t$ exists (assuming that $p(v, N(v, C))$ can be computed in $O(\deg_C(v))$).

INPUT: graph $G = (V, L)$ represented by lists of neighbors, and $t \in \mathbb{R}$
OUTPUT: $C \subseteq V$, $C$ is the $p$-core at level $t$



1. C := V;
2. for v ∈ V do p[v] := p(v, N(v, C));
3. build_min_heap(V, p);
4. while (heap not empty) and (p[top] < t) do begin
   4.1. C := C \ {top};
   4.2. for v ∈ N(top, C) do begin
      4.2.1. p[v] := p(v, N(v, C));
      4.2.2. update_heap(v, p);
   end;
end;

Step 4.2.1 can often be sped up by updating $p[v]$ incrementally instead of recomputing it from scratch.
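A sketch of this heap-based algorithm in Python follows. Python's `heapq` module has no `update_heap` (decrease-key) operation, so this version substitutes a common workaround: a changed value is pushed as a new entry, and outdated entries are skipped when they reach the top (lazy deletion). The function name `p_core_heap` and the calling convention for the local function `p` (a vertex and its set of neighbors within $C$) are assumptions; the comments refer to the numbered steps above.

```python
import heapq
from typing import Callable, Dict, Set

Graph = Dict[str, Set[str]]
# Local property function: value of v computed from its neighbors inside C
LocalFn = Callable[[str, Set[str]], float]


def p_core_heap(adj: Graph, p: LocalFn, t: float) -> Set[str]:
    C = set(adj)                                          # step 1
    val = {v: p(v, adj[v] & C) for v in adj}              # step 2
    heap = [(val[v], v) for v in adj]
    heapq.heapify(heap)                                   # step 3
    while heap:
        pv, top = heap[0]
        if top not in C or pv != val[top]:
            heapq.heappop(heap)                           # outdated entry, skip it
            continue
        if pv >= t:                                       # loop condition of step 4
            break
        heapq.heappop(heap)
        C.remove(top)                                     # step 4.1
        for v in adj[top] & C:                            # step 4.2
            val[v] = p(v, adj[v] & C)                     # step 4.2.1
            heapq.heappush(heap, (val[v], v))             # step 4.2.2 (lazy update)
    return C


def local_degree(v: str, nbrs: Set[str]) -> float:
    """p_1 written as a local function: the size of N(v, C)."""
    return len(nbrs)


adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(sorted(p_core_heap(adj, local_degree, 2)))  # ['a', 'b', 'c']
```

Because the heap top is the smallest live value, the loop can stop as soon as that value reaches $t$.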

This algorithm can be straightforwardly extended to produce the hierarchy of p-cores.
The hierarchy is determined by the core number assigned to each vertex: the
highest level of a p-core that contains the vertex.

3.2 Determining the hierarchy of p-cores 

INPUT: graph $G = (V, L)$ represented by lists of neighbors
OUTPUT: table core with the core number of each vertex



1. C := V;
2. for v ∈ V do p[v] := p(v, N(v, C));
3. build_min_heap(V, p);
4. while sizeof(heap) > 0 do begin
   4.1. C := C \ {top};
   4.2. core[top] := p[top];
   4.3. for v ∈ N(top, C) do begin
      4.3.1. p[v] := max{p[top], p(v, N(v, C))};
      4.3.2. update_heap(v, p);
   end;
end;
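The hierarchy algorithm can be sketched in the same style for $p_5$ (sum of line values), where step 4.3.1 reduces to subtracting the value of the line to the deleted vertex. This Python sketch again replaces `update_heap` with lazy deletion from a `heapq` heap; the dictionary representation and the names `p5_core_numbers`, `raw` and `val` are hypothetical.

```python
import heapq
from typing import Dict

Network = Dict[str, Dict[str, float]]  # adj[v][u] = w(v:u), undirected, w >= 0


def p5_core_numbers(adj: Network) -> Dict[str, float]:
    """Return the p5-core number of every vertex."""
    C = set(adj)                                          # step 1
    raw = {v: sum(adj[v].values()) for v in adj}          # step 2: p5(v, N(v, C))
    val = dict(raw)                                       # heap keys p[v]
    heap = [(val[v], v) for v in adj]
    heapq.heapify(heap)                                   # step 3
    core: Dict[str, float] = {}
    while heap:                                           # step 4
        pv, top = heapq.heappop(heap)
        if top not in C or pv != val[top]:
            continue                                      # outdated entry, skip it
        C.remove(top)                                     # step 4.1
        core[top] = pv                                    # step 4.2
        for v, w in adj[top].items():                     # step 4.3: v in N(top, C)
            if v in C:
                raw[v] -= w                               # p5(v, N(v, C)) updated in O(1)
                val[v] = max(pv, raw[v])                  # step 4.3.1
                heapq.heappush(heap, (val[v], v))         # step 4.3.2 (lazy update)
    return core


# The network from the nonmonotone example, now with p5 instead of the average
lines = [("a", "b", 4.0), ("b", "c", 1.0), ("c", "d", 3.0), ("b", "e", 1.0), ("e", "f", 3.0)]
adj: Network = {}
for x, y, w in lines:
    adj.setdefault(x, {})[y] = w
    adj.setdefault(y, {})[x] = w
print(sorted(p5_core_numbers(adj).items()))
# [('a', 4.0), ('b', 4.0), ('c', 3.0), ('d', 3.0), ('e', 3.0), ('f', 3.0)]
```

Keeping `raw` separate from the clamped heap key `val` is what makes the constant-time update valid: the subtraction always acts on the true value of $p_5(v, N(v, C))$.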



Let us assume that $P$ is the maximum time needed for computing the value of
$p(v, U)$, $v \in V$, $U \subseteq V$. Then the complexity of statements 1 to 3 is $T_{1-3} =
O(n) + O(Pn) + O(n \log n) = O(n \max(P, \log n))$. Let us now look at the body
of the while loop. Since at each repetition of the body the size of the set $C$ is
decreased by 1, there are at most $n$ repetitions. Statements 4.1 and 4.2 can be
implemented to run in $O(\log n)$ time (removing the top element from the heap), thus
contributing $T_{4.1,4.2} = O(n \log n)$ to the loop. In all combined repetitions of the while
and for loops each line is considered at most once. Therefore the body of the for loop
(statements 4.3.1 and 4.3.2) is executed at most $m$ times, contributing at most
$T_{4.3} = m (P + O(\log n))$ to the while loop. Often the value $p(v, N(v, C))$ can be updated
in constant time, so that $P = O(1)$. Summing up all the contributions, and assuming $n = O(m)$
(for example, when there are no isolated vertices), we get the total time complexity of
the algorithm $T = T_{1-3} + T_{4.1,4.2} + T_{4.3} = O(m \max(P, \log n))$.

For a local $p$ function, for which the value of $p(v, N(v, C))$ can be computed in
$O(\deg_C(v))$, we have $P = O(\Delta)$, where $\Delta$ is the maximum degree of $G$.

The described algorithm is partially implemented in Pajek (Slovene for spider), a program
for the analysis of large networks for Windows (32 bit) [Batagelj and Mrvar 1998].
It is freely available for noncommercial use at its homepage:

http://vlado.fmf.uni-lj.si/pub/networks/pajek/
A standalone implementation of the algorithm in C is available at

http://www.educa.fmf.uni-lj.si/datana/pub/networks/cores/
For the property functions $p_1$ to $p_4$ a faster $O(m)$ core-determining algorithm can
be developed [Batagelj and Zaversnik 2001].

Table I. $p_5$-cores of the Routing Data Network at Different Levels.

4. EXAMPLE - INTERNET CONNECTIONS 

As an example, we applied the proposed algorithm to routing data on the Internet.
This network was produced from web scanning data (May 1999) available from

http://www.cs.bell-labs.com/who/ches/map/index.html
It can also be obtained as a Pajek NET file from

http://vlado.fmf.uni-lj.si/pub/networks/data/web/web.zip
It has 124 651 vertices, 195 029 arcs (loops were removed), maximum degree $\Delta = 151$,
and average degree 3.13. The value of each arc is the number of traceroute paths
that contain it.

Using the Pajek implementation of the proposed algorithm on a 300 MHz PC, we obtained
in 3 seconds the $p_5$-core segmentation presented in Table I: there are $n_k$
vertices with $p_5$-core number in the interval $(t_{k-1}, t_k]$.

The program also determined the $p_5$-core number of every vertex. Figure 3
shows the $p_5$-core at level 25000 of the Internet network: every vertex inside this
core is visited by at least 25000 traceroute paths. In the figure the sizes of the circles
representing vertices are proportional to (the square roots of) their $p_5$-core numbers.
Since the arc values range from 1 to 626826, they cannot be displayed directly. We
recoded them according to the thresholds $1000 \cdot 2^{k-1}$, $k = 1, 2, 3, \ldots$; these class
numbers are represented by the thickness of the arcs.
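The recoding of arc values into classes can be expressed as a small function. The paper does not state how values lying exactly on a threshold are treated; this Python sketch assumes that a value $x$ belongs to the smallest class $k \geq 1$ with $x \leq 1000 \cdot 2^{k-1}$, and the name `arc_class` is hypothetical.

```python
def arc_class(value: float, base: float = 1000.0) -> int:
    """Smallest k >= 1 such that value <= base * 2**(k - 1) (assumed boundary convention)."""
    k = 1
    while value > base * 2 ** (k - 1):
        k += 1
    return k


print(arc_class(1), arc_class(2500), arc_class(626826))  # 1 3 11
```

Under this convention the observed range of arc values, 1 to 626826, maps to classes 1 to 11.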

5. CONCLUSIONS 

The cores, because they can be efficiently determined, are one of the few concepts
that provide us with meaningful decompositions of large networks [Garey and Johnson 1979].
We expect that different approaches to the analysis of large networks can
be built on this basis. For example, the sequence of vertices in sequential coloring
can be determined by descending order of their core numbers (combined with their
degrees). On this basis we obtain the following bound on the chromatic number of
a given graph $G$:

$$\chi(G) \leq 1 + \mathrm{core}(G)$$
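The coloring bound can be illustrated with a short Python sketch. It repeatedly removes a vertex of minimum degree (which produces the ordinary cores), records the largest degree seen at removal time, which equals $\mathrm{core}(G)$, and then colors the vertices greedily in reverse removal order. Using the reverse removal order is our concrete reading of "descending order of core numbers combined with degrees"; the function names are hypothetical. When a vertex is colored, its already colored neighbors are exactly the neighbors it still had when it was removed, at most $\mathrm{core}(G)$ of them, so at most $1 + \mathrm{core}(G)$ colors are used.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def removal_order(adj: Graph) -> Tuple[List[str], int]:
    """Remove minimum-degree vertices one by one; return the order and core(G)."""
    C = set(adj)
    order: List[str] = []
    core = 0
    while C:
        v = min(sorted(C), key=lambda u: len(adj[u] & C))
        core = max(core, len(adj[v] & C))
        order.append(v)
        C.remove(v)
    return order, core


def sequential_coloring(adj: Graph) -> Dict[str, int]:
    """Greedy coloring in reverse removal order (roughly, vertices of higher cores first)."""
    order, _ = removal_order(adj)
    color: Dict[str, int] = {}
    for v in reversed(order):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # smallest color not used by an already colored neighbor
            c += 1
        color[v] = c
    return color


adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
_, k = removal_order(adj)
print(k, max(sequential_coloring(adj).values()) + 1)  # 2 3
```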
Fig. 3. $p_5$-core of the Routing Data Network.

Cores can also be used to localize the search for interesting subnetworks in large 
networks [Batagelj et al. 1999; Batagelj and Mrvar 2000]: 

— If it exists, a $k$-component is contained in a $k$-core.

— If it exists, a $k$-clique is contained in a $(k-1)$-core; hence $\omega(G) \leq 1 + \mathrm{core}(G)$.